Variable selection in discriminant partial least-squares analysis.

Authors

  • B K Alsberg
  • D B Kell
  • R Goodacre

Abstract

Variable selection enhances the understanding and interpretability of multivariate classification models. A new chemometric method based on the selection of the most important variables in discriminant partial least-squares (VS-DPLS) analysis is described. The suggested method is a simple extension of DPLS in which only a small number of elements in the weight vector w are retained for each factor. The optimal number of DPLS factors is determined by cross-validation. The new algorithm is applied to four different high-dimensional spectral data sets with excellent results. Spectral profiles from Fourier transform infrared spectroscopy and pyrolysis mass spectrometry are used. To investigate the uniqueness of the selected variables, an iterative VS-DPLS procedure is performed. At each iteration, the previously selected variables are removed to see whether a new VS-DPLS classification model can be constructed using a different set of variables. In this manner, it is possible to determine regions, rather than individual variables, that are important for successful classification.
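The abstract describes VS-DPLS only in outline, so the NumPy sketch that follows illustrates its three ingredients under explicit assumptions rather than reproducing the authors' code. Assumed: DPLS is run as PLS on a one-hot (dummy) class matrix with NIPALS-style deflation; "retaining a small number of elements of w" is taken to mean zeroing all but the `n_keep` largest-magnitude elements and renormalizing; the number of factors is chosen by k-fold cross-validated misclassification rate; and the iterative procedure drops every previously selected column before refitting. The function names (`vs_dpls_fit`, `vs_dpls_predict`, `choose_n_factors`, `iterative_vs_dpls`) and the parameters `n_keep` and `n_folds` are hypothetical.

```python
import numpy as np

# Hypothetical helper names; an illustrative sketch of the idea in the abstract,
# not the published VS-DPLS implementation.


def vs_dpls_fit(X, labels, n_factors, n_keep):
    """PLS on a dummy (one-hot) class matrix in which every weight vector w keeps
    only its n_keep largest-magnitude elements (assumed rule; needs n_keep >= 1)."""
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean  # centered X and Y residuals
    W, P, Q = [], [], []
    for _ in range(n_factors):
        # The PLS2 weight direction is the dominant left singular vector of E^T F.
        u, _, _ = np.linalg.svd(E.T @ F, full_matrices=False)
        w = u[:, 0].copy()
        # Variable selection: zero all but the n_keep largest |w_j|, renormalize.
        w[np.argsort(np.abs(w))[:-n_keep]] = 0.0
        w /= np.linalg.norm(w)
        t = E @ w  # scores for this factor
        tt = t @ t
        p, q = E.T @ t / tt, F.T @ t / tt
        E = E - np.outer(t, p)  # deflate X residual
        F = F - np.outer(t, q)  # deflate Y residual
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.solve(P.T @ W, Q.T)  # B = W (P^T W)^{-1} Q^T
    selected = np.flatnonzero(np.any(W != 0.0, axis=1))
    return {"B": B, "x_mean": x_mean, "y_mean": y_mean,
            "classes": classes, "selected": selected}


def vs_dpls_predict(model, X):
    """Assign each row of X to the class with the largest predicted dummy response."""
    Y_hat = (X - model["x_mean"]) @ model["B"] + model["y_mean"]
    return model["classes"][np.argmax(Y_hat, axis=1)]


def choose_n_factors(X, labels, max_factors, n_keep, n_folds=5, seed=0):
    """Assumed criterion: k-fold cross-validated misclassification rate."""
    n = len(labels)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    errors = []
    for a in range(1, max_factors + 1):
        wrong = 0
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            m = vs_dpls_fit(X[train], labels[train], a, n_keep)
            wrong += np.sum(vs_dpls_predict(m, X[test]) != labels[test])
        errors.append(wrong / n)
    return int(np.argmin(errors)) + 1


def iterative_vs_dpls(X, labels, n_iter, max_factors, n_keep):
    """Refit repeatedly, removing previously selected columns before each round."""
    available = np.arange(X.shape[1])  # original column indices still usable
    rounds = []
    for _ in range(n_iter):
        a = choose_n_factors(X[:, available], labels, max_factors, n_keep)
        model = vs_dpls_fit(X[:, available], labels, a, n_keep)
        chosen = available[model["selected"]]  # map back to original columns
        rounds.append((a, chosen))
        available = np.setdiff1d(available, chosen)
    return rounds
```

Zeroing elements of w, rather than pre-filtering the columns of X, keeps each latent factor a combination of only `n_keep` variables while deflation still acts on the full data matrix. Because each deflation leaves E_a w_a = 0, the usual relation B = W(P^T W)^{-1} Q^T remains valid for truncated weights.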

Similar resources

Chemometric Studies for Quality Control of Processed Brazilian Coffees Using DRIFTS

In this work, the potential of mid-infrared diffuse reflectance spectroscopy with Fourier transform for discrimination of 29 commercial Brazilian coffee samples with different industrial processing, i.e., caffeine extraction and roasting degree, was evaluated. The statistical treatments applied to pretreated spectral data were principal component analysis and partial least squares – discriminan...

Multivariate Classification for Qualitative Analysis

Introduction; Principles of classification; The classes; Main categories of classification methods ...

Soil type recognition as improved by genetic algorithm-based variable selection using near infrared spectroscopy and partial least squares discriminant analysis

Soil types have traditionally been determined by soil physical and chemical properties, diagnostic horizons and pedogenic processes based on a given classification system. This is a laborious and time-consuming process. Near infrared (NIR) spectroscopy can comprehensively characterize soil properties, and may provide a viable alternative method for soil type recognition. Here, we presented a pa...

Using machine learning methods to predict experimental high-throughput screening data

High-throughput screening (HTS) remains a very costly process notwithstanding many recent technological advances in the field of biotechnology. In this study we consider the application of machine learning methods for predicting experimental HTS measurements. Such a virtual HTS analysis can be based on the results of real HTS campaigns carried out with similar compound libraries and similar dr...

Chemometric Feature Selection and Classification of Ganoderma lucidum Spores and Fruiting Body Using ATR-FTIR Spectroscopy

Ganoderma lucidum (G. lucidum) spores, a valuable Chinese herbal medicine, have vast market prospects owing to their bioactivities and medicinal efficacy. This study aims at the development of an effective and simple analytical method to distinguish G. lucidum spores from its fruiting body, which is of essential importance for the quality control and fast discrimination of raw materials of Chinese...

Variable Selection and Parameter Tuning in High-Dimensional Prediction

In the context of classification using high-dimensional data such as microarray gene expression data, it is often useful to perform preliminary variable selection. For example, the k-nearest-neighbors classification procedure yields a much higher accuracy when applied on variables with high discriminatory power. Typical (univariate) variable selection methods for binary classification are, e.g....

Journal title:
  • Analytical Chemistry

Volume 70, Issue 19

Publication date: 1998